ABSTRACT

In this paper, we present a novel information theoretic approach to image segmentation. We cast the segmentation problem as the maximization of the mutual information between the region labels and the image pixel intensities, subject to a constraint on the total length of the region boundaries. We assume that the probability densities associated with the image pixel intensities within each region are completely unknown a priori, and we formulate the problem based on nonparametric density estimates. Due to this nonparametric structure, our method does not require the image regions to have a particular type of probability distribution, nor does it require the extraction and use of a particular statistic. We solve the information-theoretic optimization problem by deriving the associated gradient flows and applying curve evolution techniques. We use fast level set methods to implement the resulting evolution. The evolution equations are based on nonparametric statistics and have an intuitive appeal. Experimental results on both synthetic and real images demonstrate that the proposed technique can solve a variety of challenging image segmentation problems.

1. INTRODUCTION 

Image segmentation has been an important problem in image analysis, with applications to pattern recognition, object detection, and medical image analysis. Consequently, there has been a considerable amount of work on image segmentation, including methods based on curve evolution techniques [2, 3, 4, 7, 8, 10, 11, 14, 15]. For example, Paragios et al. [8] developed a parametric model for the analysis and segmentation of textured images. Yezzi et al. [15] developed a segmentation technique using a particular discriminative statistical feature, such as the mean or the variance of image regions. These and many other recent works (such as [11]) have been inspired by the region competition model of Zhu and Yuille [16].

In all the work mentioned above, the typical statistical model 
for the underlying image was in a parametric form. However, this 
parametric approach is not robust in the sense that its performance 
is severely affected when the parametric model is not correct. 

In response to the need for robustness in statistical analysis, nonparametric methods [9] have been widely used in machine learning problems. Nonparametric methods estimate the underlying distributions from the data without any assumptions about the structure of the distributions. Mutual information, in turn, has been used as a tool to solve a variety of problems, such as MR-CT image registration [13], 3-D pose alignment [12], and measuring global and local spatial correspondence [1].

In this paper, we propose a novel approach to image segmentation. Here we focus on images with two regions, but the method can be generalized to multi-region images. We segment a given image into foreground and background by evolving a closed curve, with a penalty on curve length, so as to maximize the nonparametric estimate of the mutual information between the image pixel intensity and the binary label determined by the curve (foreground region inside the curve, background region outside the curve). The resulting curve evolution formula involves a nonparametric likelihood ratio and additional terms that account for the change in the density estimates caused by the evolution of the curve. To compute the density estimates, we use the fast calculation methods proposed in [6].

The remainder of this paper is organized as follows. Section 2 presents the novel information theoretic objective functional for image segmentation. Section 3 then derives our curve evolution-based approach to minimizing this objective functional. We then present experimental results in Section 4, using both synthetic and real images. Finally, we conclude in Section 5 with a summary.

2. INFORMATION THEORETIC APPROACH TO IMAGE SEGMENTATION

In this section, we state the problem and our assumptions, and we present our information theoretic segmentation criterion.

2.1. Image Model 

The image model we consider has two unknown regions $R_1$ and $R_2$ with associated unknown distributions $p_1$ and $p_2$. The image intensity at pixel $x$, denoted by $I(x)$, is drawn from $p_1$ if $x \in R_1$ and from $p_2$ if $x \in R_2$. The left-hand side of Figure 1 illustrates the image model.

The goal of curve evolution is to move the curve $C$ so that it matches the boundary between $R_1$ and $R_2$, i.e., so that the region inside the curve, $R$, and the region outside the curve, $R^c$, converge to $R_1$ and $R_2$, respectively.

2.2. Mutual Information between Image Intensity and the Label

Fig. 1. Left: Illustration of the foreground region ($R_1$), the background region ($R_2$), and the associated distributions ($p_1$ and $p_2$). Right: Illustration of the curve ($C$), the region inside the curve ($R$), and the region outside the curve ($R^c$).

We present an information theoretic energy functional based on the mutual information between a binary region label and the intensity values of an image. We define the binary label determined by the curve $C$ as a mapping $L : \Omega \to \{F, B\}$ from the image domain $\Omega$ to $\{F, B\}$ as follows:

$$
L(x) = \begin{cases} F, & \text{if } x \in R, \\ B, & \text{if } x \in R^c. \end{cases} \tag{1}
$$



Let $X$ be a random variable that is uniformly distributed over the image domain $\Omega$. Then $L(X)$ is a binary random variable taking the value $F$ or $B$ with probability $\frac{|R|}{|\Omega|}$ and $\frac{|R^c|}{|\Omega|}$, respectively, where $|\cdot|$, the cardinality of a set, is given by the area of the set. Note that $L(X)$ conveys information about the image intensity $I(X)$ at the random location $X$. The mutual information between the image intensity at $X$ and the label at $X$ is given formally as follows:

$$
\begin{aligned}
I(I(X); L(X)) &= h(I(X)) - h(I(X) \mid L(X)) \\
&= h(I(X)) - \Pr(L(X) = F)\, h(I(X) \mid L(X) = F) \\
&\quad - \Pr(L(X) = B)\, h(I(X) \mid L(X) = B),
\end{aligned} \tag{2}
$$

where the differential entropy [5] of a continuous random variable $Z$ with support $S$ is defined by

$$
h(Z) = -\int_S p_Z(z) \log p_Z(z)\, dz. \tag{3}
$$
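As a quick numerical check of definition (3), the sketch below integrates $-p_Z \log p_Z$ for a Gaussian density on a fine grid and compares the result with the known closed form $h(Z) = \frac{1}{2}\log(2\pi e \sigma^2)$ (natural logarithm, so entropies are in nats). The helper names, the integration limits, and the grid size are illustrative choices, not part of the paper.

```python
import numpy as np

def gaussian_pdf(z, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at z."""
    return np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)

def differential_entropy(pdf, lo, hi, n=20001):
    """Approximate h(Z) = -int p(z) log p(z) dz on [lo, hi] with the trapezoidal rule (nats)."""
    z = np.linspace(lo, hi, n)
    p = pdf(z)
    # Clamp inside the log so that p = 0 contributes 0 instead of 0 * (-inf) = nan.
    integrand = -p * np.log(np.maximum(p, np.finfo(float).tiny))
    dz = z[1] - z[0]
    return dz * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))

sigma = 2.0
numeric = differential_entropy(lambda z: gaussian_pdf(z, 0.0, sigma), -20.0, 20.0)
closed_form = 0.5 * np.log(2.0 * np.pi * np.e * sigma**2)
print(numeric, closed_form)  # both approximately 2.112
```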



2.3. Utility of Mutual Information as a Segmentation Statistic 

Since $L(X)$ is a deterministic function of $X$, the variables $I(X)$, $X$, $L(X)$ form a Markov chain, and by the data processing inequality,

$$
I(I(X); L(X)) \le I(I(X); X), \tag{4}
$$

where equality holds if and only if $I(X)$, $L(X)$, $X$ also form a Markov chain, i.e., $I(X)$ and $X$ are conditionally independent given $L(X)$. If $L(\cdot)$ is not the correct segmentation, then knowing $L(X)$ is not enough to determine whether the distribution of $I(X)$ is $p_1$ or $p_2$, and thus $I(X)$ is not conditionally independent of $X$ given $L(X)$. Since the upper bound $I(I(X); X)$ does not depend on the curve, $I(I(X); L(X))$ is maximized if and only if $L(\cdot)$ gives the correct segmentation.
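The argument above can be illustrated numerically. The sketch below is not the estimator used in this paper: it quantizes intensities into histogram bins and computes a discrete plug-in analogue of (2), $H(I) - \Pr(F)\,H(I \mid F) - \Pr(B)\,H(I \mid B)$. On a synthetic two-region image, a labeling that matches the true regions yields a clearly larger value than a labeling unrelated to them. The image size, the region layout, the Gaussian parameters, and the bin count are assumptions made for illustration.

```python
import numpy as np

def entropy_from_counts(counts):
    """Shannon entropy (nats) of the empirical distribution given by histogram counts."""
    p = counts / counts.sum()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def mutual_information_label(image, mask, bins=32):
    """Plug-in estimate of I(I(X); L(X)) for a binary mask (True = F, inside the curve).

    Mirrors (2): H(I) - Pr(F) H(I | F) - Pr(B) H(I | B), where all three
    histograms share the same equal-width intensity bins.
    """
    edges = np.linspace(image.min(), image.max(), bins + 1)
    mi = entropy_from_counts(np.histogram(image, bins=edges)[0])
    for region in (mask, ~mask):
        if region.any():
            weight = region.mean()  # Pr(L(X) = label) = |region| / |Omega|
            mi -= weight * entropy_from_counts(np.histogram(image[region], bins=edges)[0])
    return mi

rng = np.random.default_rng(0)
true_mask = np.zeros((64, 64), dtype=bool)
true_mask[16:48, 16:48] = True
# Foreground and background Gaussians with different means and equal variance.
image = np.where(true_mask,
                 rng.normal(1.0, 1.0, true_mask.shape),
                 rng.normal(-1.0, 1.0, true_mask.shape))

wrong_mask = np.zeros_like(true_mask)
wrong_mask[:, :32] = True  # left half of the image, unrelated to the true regions

mi_true = mutual_information_label(image, true_mask)
mi_wrong = mutual_information_label(image, wrong_mask)
print(mi_true, mi_wrong)  # the correct labeling gives the larger value
```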

2.4. The Energy Functional 

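Since $I(I(X); L(X))$ is a functional of the unknown densities $p_1$ and $p_2$, we need to estimate the mutual information:

$$
\begin{aligned}
\hat{I}(I(X); L(X)) &= \hat{h}(I(X)) - \Pr(L(X) = F)\, \hat{h}(I(X) \mid L(X) = F) \\
&\quad - \Pr(L(X) = B)\, \hat{h}(I(X) \mid L(X) = B).
\end{aligned} \tag{5}
$$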



We combine the mutual information estimate with the typical regularization penalizing the length of the curve. This regularization prevents the formation of fractal segmenting curves. The resulting energy functional to minimize is then given by

$$
E(C) = -\hat{I}(I(X); L(X)) + \alpha \oint_C ds, \tag{6}
$$

where $\oint_C ds$ is the length of the curve and $\alpha$ is a scalar parameter.
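The effect of the length term in (6) can be seen on a pixel grid. The sketch below uses a hypothetical helper `boundary_length` that approximates $\oint_C ds$ by counting the horizontal and vertical pixel edges separating $R$ from $R^c$ (a crude approximation that overestimates the length of diagonal boundaries), and combines it with a given mutual-information estimate as in (6). For equal mutual-information values, a jagged mask pays a larger penalty than a smooth one; the value of `alpha` used below is arbitrary.

```python
import numpy as np

def boundary_length(mask):
    """Approximate curve length: number of 4-neighbour pixel edges separating
    True (region R) from False (region R^c). Edges on the image border are ignored."""
    m = mask.astype(np.int8)
    return float(np.abs(np.diff(m, axis=0)).sum() + np.abs(np.diff(m, axis=1)).sum())

def energy(mi_estimate, mask, alpha):
    """Discrete analogue of (6): E(C) = -I_hat + alpha * length(C)."""
    return -mi_estimate + alpha * boundary_length(mask)

square = np.zeros((64, 64), dtype=bool)
square[16:48, 16:48] = True
comb = square.copy()
comb[15, 16:48:2] = True  # 16 isolated pixels glued onto the top edge

print(boundary_length(square), boundary_length(comb))  # 128.0 160.0
```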

3. NONPARAMETRIC DENSITY ESTIMATION AND GRADIENT FLOWS

This section derives the curve evolution formula for minimizing the energy functional (6) using nonparametric Parzen density estimates.

3.1. Estimation of the Differential Entropy 

Expression (5) involves differential entropy estimates, and we use nonparametric Parzen density estimates to estimate these differential entropies.

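Since $h(I(X))$ does not depend on the curve or the label, we need only consider $h(I(X) \mid L(X) = F)$ and $h(I(X) \mid L(X) = B)$, which are estimated as follows:

\begin{align}
h(I(X) \mid L(X) = F) &\approx -\frac{1}{|R|} \int_R \log p_R(I(x))\, dx \tag{7} \\
&\approx -\frac{1}{|R|} \int_R \log\left( \frac{1}{|R|} \int_R K\big(I(x) - I(\hat{x})\big)\, d\hat{x} \right) dx, \tag{8}
\end{align}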

where (7) is an approximation of the entropy using the weak law of large numbers, and (8) uses a continuous version of the Parzen density estimate [9] of $p_R \stackrel{\text{def}}{=} p_{I(X) \mid L(X) = F}$. In (8), the kernel is $K(z) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-z^2/(2\sigma^2)}$, where $\sigma$ is a scalar parameter. Similarly,



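$$
h(I(X) \mid L(X) = B) \approx -\frac{1}{|R^c|} \int_{R^c} \log\left( \frac{1}{|R^c|} \int_{R^c} K\big(I(x) - I(\hat{x})\big)\, d\hat{x} \right) dx. \tag{9}
$$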
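A direct discretization of (8) and (9) replaces the region integrals by averages over the pixels of each region: every pixel intensity is scored under the Parzen estimate built from all pixels of the same region (including itself, as in the continuous formula), and the negative log-densities are averaged. The sketch below builds the full pairwise kernel matrix, so it costs $O(|R|^2)$ time and memory and is meant only for small images; the kernel width `sigma = 0.3`, the helper names, and the test data are assumptions. For Gaussian data the estimate should land near the closed-form entropy, slightly above it because the kernel smoothing adds $\sigma^2$ to the variance.

```python
import numpy as np

def gaussian_kernel(z, sigma):
    """K(z) = exp(-z^2 / (2 sigma^2)) / sqrt(2 pi sigma^2), the kernel used in (8)."""
    return np.exp(-0.5 * (z / sigma) ** 2) / np.sqrt(2.0 * np.pi * sigma**2)

def region_entropy(image, region_mask, sigma):
    """Discrete version of (8)/(9): -(1/|R|) sum_{x in R} log p_R(I(x)), where
    p_R(z) = (1/|R|) sum_{x_hat in R} K(z - I(x_hat)) is the Parzen estimate."""
    values = image[region_mask].astype(float)
    pairwise = gaussian_kernel(values[:, None] - values[None, :], sigma)  # O(|R|^2) memory
    p_at_samples = pairwise.mean(axis=1)
    return -np.mean(np.log(p_at_samples))

rng = np.random.default_rng(1)
# Top half: N(0, 1) intensities; bottom half: N(0, 2^2) intensities.
image = np.concatenate([rng.normal(0.0, 1.0, (30, 40)), rng.normal(0.0, 2.0, (30, 40))])
top = np.zeros(image.shape, dtype=bool)
top[:30] = True
h_top = region_entropy(image, top, sigma=0.3)
h_bottom = region_entropy(image, ~top, sigma=0.3)
print(h_top, h_bottom)  # near 1.419 and 2.112, the Gaussian closed forms
```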

3.2. Gradient Flows for General Nested Region Integrals 

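Note that (8) and (9) contain nested region integrals. For a general nested region integral of the form

$$
\int_R f(\varepsilon(x, t))\, dx, \quad \text{where } \varepsilon(x, t) = \int_R g(x, \hat{x})\, d\hat{x}, \tag{10}
$$

we have derived the gradient flow (the negative of the gradient, so that the integral decreases most rapidly), which is given by

$$
\frac{\partial C}{\partial t} = -\left[ f(\varepsilon(C)) + \int_R f'(\varepsilon(x))\, g(x, C)\, dx \right] N, \tag{11}
$$

where $N$ is the outward unit normal vector.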
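Formula (11) can be sanity-checked in one dimension. Take $R = [a, b]$, treat the right endpoint $b$ as the "curve" with outward normal $N = +1$, and let $F(b) = \int_a^b f(\varepsilon(x))\,dx$ with $\varepsilon(x) = \int_a^b g(x, \hat{x})\,d\hat{x}$. Differentiating under the integral gives $F'(b) = f(\varepsilon(b)) + \int_a^b f'(\varepsilon(x))\,g(x, b)\,dx$, the bracketed term in (11), whose negative is the gradient flow. The sketch compares a central finite difference of $F$ with this expression; $f = \log$, the Gaussian-shaped `g`, the interval, and the quadrature grid are arbitrary choices for the check.

```python
import numpy as np

def g(x, x_hat):
    """Hypothetical smooth, positive integrand g(x, x_hat) chosen only for this check."""
    return np.exp(-0.5 * (x - x_hat) ** 2)

def trapezoid(y, x):
    """Trapezoidal rule along the last axis of y on the uniform grid x."""
    dx = x[1] - x[0]
    return dx * (y.sum(axis=-1) - 0.5 * (y[..., 0] + y[..., -1]))

def eps_on_grid(a, b, n):
    """Grid x on [a, b] and eps(x) = int_a^b g(x, x_hat) dx_hat at every grid point."""
    x = np.linspace(a, b, n)
    return x, trapezoid(g(x[:, None], x[None, :]), x)

def nested_integral(a, b, n=2001):
    """F(b) = int_a^b f(eps(x)) dx with f = log."""
    x, eps = eps_on_grid(a, b, n)
    return trapezoid(np.log(eps), x)

def bracket_term(a, b, n=2001):
    """f(eps(b)) + int_a^b f'(eps(x)) g(x, b) dx with f = log, so f'(e) = 1 / e."""
    x, eps = eps_on_grid(a, b, n)
    return np.log(eps[-1]) + trapezoid(g(x, b) / eps, x)

a, b, delta = 0.0, 1.5, 1e-4
finite_diff = (nested_integral(a, b + delta) - nested_integral(a, b - delta)) / (2.0 * delta)
print(finite_diff, bracket_term(a, b))  # the two numbers agree closely
```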
3.3. The Gradient Flows for the Information Theoretic Energy Functional

Now, based on (11), (8), and (9), the gradient flow for $E(C)$ of (6) is obtained as follows:



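$$
\begin{aligned}
\frac{\partial C}{\partial t} = \frac{1}{|\Omega|} \Bigg[ &\log \frac{p_R(I(C))}{p_{R^c}(I(C))} + \frac{1}{|R|} \int_R \frac{K\big(I(x) - I(C)\big)}{p_R(I(x))}\, dx \\
&- \frac{1}{|R^c|} \int_{R^c} \frac{K\big(I(x) - I(C)\big)}{p_{R^c}(I(x))}\, dx \Bigg] N - \alpha \kappa N,
\end{aligned}
$$

where $\kappa$ is the curvature of the curve and $-\alpha\kappa N$ is the gradient flow for the curve length penalty, whose derivation can be found in [14].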

The first term of this gradient flow is a likelihood ratio test that compares the hypotheses that the observed image intensity $I(C)$ at a given point on the active contour $C$ belongs to the foreground region $R$ or to the background region $R^c$, based upon the current estimates of the distributions $p_R$ and $p_{R^c}$. The second and third terms respond to the changes in the distributions $p_R$ and $p_{R^c}$ incurred by moving a given point on the active contour.

These last two terms distinguish this active contour model from those obtained using coordinate descent, in which alternating iterations of estimating the distribution parameters inside and outside the curve are followed by likelihood ratio tests to evolve the curve, as in the "Region Competition" algorithm of Zhu and Yuille [16]. In such algorithms, changes in the distributions are not directly coupled with the likelihood ratio tests. In contrast, the mathematical structure of our nonparametric estimators is built directly into the curve evolution equation through the last two terms.
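The data part of this flow can be evaluated by brute force on a pixel grid. The sketch below, an $O(n^2)$ illustration rather than a fast implementation, computes at a set of curve points the log-likelihood ratio plus the two density-correction terms, scaled by $1/|\Omega|$, using the Gaussian kernel of (8); the curvature term $-\alpha\kappa N$ is omitted. Passing curve intensities directly as `curve_values` (instead of extracting them from a contour), the helper names, the test image, and `sigma` are assumptions. A positive speed moves a point outward, so the foreground claims intensities that are more likely under $p_R$.

```python
import numpy as np

def gaussian_kernel(z, sigma):
    """Gaussian kernel K of (8) with width sigma."""
    return np.exp(-0.5 * (z / sigma) ** 2) / np.sqrt(2.0 * np.pi * sigma**2)

def parzen(samples, points, sigma):
    """Parzen estimate (1/|samples|) sum_k K(point - samples[k]) at each point."""
    return gaussian_kernel(points[:, None] - samples[None, :], sigma).mean(axis=1)

def data_speed(image, mask, curve_values, sigma):
    """Normal speed of the data term (the -alpha*kappa term is omitted) at curve points
    whose intensities are curve_values. Positive speed moves the point outward (R grows)."""
    inside = image[mask].astype(float)
    outside = image[~mask].astype(float)
    c = np.asarray(curve_values, dtype=float)
    log_ratio = np.log(parzen(inside, c, sigma) / parzen(outside, c, sigma))
    # (1/|R|) sum_{x in R} K(I(x) - I(C)) / p_R(I(x)), and the same for R^c.
    corr_in = (gaussian_kernel(inside[None, :] - c[:, None], sigma)
               / parzen(inside, inside, sigma)).mean(axis=1)
    corr_out = (gaussian_kernel(outside[None, :] - c[:, None], sigma)
                / parzen(outside, outside, sigma)).mean(axis=1)
    return (log_ratio + corr_in - corr_out) / image.size  # image.size = |Omega|

rng = np.random.default_rng(2)
mask = np.zeros((48, 48), dtype=bool)
mask[12:36, 12:36] = True
image = np.where(mask, rng.normal(1.0, 1.0, mask.shape), rng.normal(-1.0, 1.0, mask.shape))
speed = data_speed(image, mask, curve_values=[2.0, -2.0], sigma=0.3)
print(speed)  # positive for 2.0 (foreground-like), negative for -2.0 (background-like)
```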

Since evaluating the density estimate at each pixel takes $O(n)$ time, where $n$ is the number of pixels, calculating the gradient flow takes $O(n^2)$ time. We reduced the computational complexity to $O(n)$ by using the fast Gauss transform [6] to calculate the density estimates.
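The paper accelerates the density evaluations with the fast Gauss transform [6]. Because the intensities are scalars, a simpler substitute, shown here only for illustration and not as the method used in the paper, is binned kernel density estimation: histogram the samples on a fine uniform grid of $B$ bins, smooth the bin counts with the kernel evaluated between bin centres, and let each sample read the value at its own bin. The cost is roughly $O(n \log B + B^2)$ instead of $O(n^2)$, at the price of a small discretization error that shrinks with the bin width. The bin count, the padding, and the helper names are assumptions.

```python
import numpy as np

def gaussian_kernel(z, sigma):
    """Gaussian kernel K of (8) with width sigma."""
    return np.exp(-0.5 * (z / sigma) ** 2) / np.sqrt(2.0 * np.pi * sigma**2)

def exact_parzen(values, sigma):
    """Direct O(n^2) Parzen estimate at every sample."""
    return gaussian_kernel(values[:, None] - values[None, :], sigma).mean(axis=1)

def binned_parzen(values, sigma, num_bins=2048):
    """Approximate Parzen estimate at every sample without forming an n x n matrix.

    Samples are histogrammed on a uniform grid padded by 4*sigma, the bin counts are
    smoothed with the kernel evaluated between bin centres, and every sample reads
    the value at the centre of its own bin.
    """
    pad = 4.0 * sigma
    edges = np.linspace(values.min() - pad, values.max() + pad, num_bins + 1)
    centres = 0.5 * (edges[:-1] + edges[1:])
    counts, _ = np.histogram(values, bins=edges)
    smoothing = gaussian_kernel(centres[:, None] - centres[None, :], sigma)
    density_at_centres = smoothing @ counts / values.size
    bin_index = np.clip(np.searchsorted(edges, values, side="right") - 1, 0, num_bins - 1)
    return density_at_centres[bin_index]

rng = np.random.default_rng(3)
values = rng.normal(0.0, 1.0, 1500)
approx = binned_parzen(values, sigma=0.3)
exact = exact_parzen(values, sigma=0.3)
print(np.max(np.abs(approx - exact) / exact))  # small relative error
```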

4. EXPERIMENTAL RESULTS 
Fig. 2. Evolution of the curve on a synthetic image: the different mean case.

We present experimental results on synthetic images and on a real image of a leopard. The three synthetic images are generated from three pairs of distributions: two Gaussian distributions with different means, two Gaussian distributions with different variances, and two distributions with the same mean and the same variance.
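Test images of the three kinds described above can be generated as in the sketch below. The image size, the square foreground, the helper name, and all distribution parameters are assumptions, since they are not specified here. For the third case, a symmetric mixture of $\mathcal{N}(-m, s^2)$ and $\mathcal{N}(m, s^2)$ has mean $0$ and variance $m^2 + s^2$, so pairing it with a single Gaussian $\mathcal{N}(0, m^2 + s^2)$ yields a bimodal and a unimodal density with equal mean and variance, in the spirit of the setting of Figure 4.

```python
import numpy as np

def synthetic_image(case, size=64, seed=0):
    """Two-region test image: a centred square foreground R_1 on a background R_2.

    case = "mean":     N(1, 1) vs N(-1, 1)           (different means)
    case = "variance": N(0, 2^2) vs N(0, 0.5^2)      (different variances)
    case = "shape":    0.5 N(-m, s^2) + 0.5 N(m, s^2) vs N(0, m^2 + s^2)
                       (bimodal vs unimodal, same mean 0 and variance m^2 + s^2)
    """
    rng = np.random.default_rng(seed)
    mask = np.zeros((size, size), dtype=bool)
    q = size // 4
    mask[q:3 * q, q:3 * q] = True
    shape = mask.shape
    if case == "mean":
        fg, bg = rng.normal(1.0, 1.0, shape), rng.normal(-1.0, 1.0, shape)
    elif case == "variance":
        fg, bg = rng.normal(0.0, 2.0, shape), rng.normal(0.0, 0.5, shape)
    elif case == "shape":
        m, s = 1.5, 0.5
        component_means = rng.choice([-m, m], size=shape)  # mixture component per pixel
        fg = rng.normal(component_means, s)
        bg = rng.normal(0.0, np.sqrt(m**2 + s**2), shape)
    else:
        raise ValueError(f"unknown case: {case}")
    return np.where(mask, fg, bg), mask

image, mask = synthetic_image("shape", size=256)
print(image[mask].mean(), image[~mask].mean())  # both close to 0
print(image[mask].var(), image[~mask].var())    # both close to 1.5**2 + 0.5**2 = 2.5
```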

Figure 2 shows the result for the first case, where the foreground and background distributions have different means and the same variance.





Fig. 3. Evolution of the curve on a synthetic image: the different variance case.
(a) image (b) with boundary (c) $p_1$ (d) $p_2$

Fig. 4. Example image with two regions (boundaries marked in (b)), where the foreground has a bimodal density $p_1$ and the background has a unimodal density $p_2$. The two densities $p_1$ and $p_2$ have the same mean and the same variance.



Figure 3 shows the result for the second case, where the two 
distributions for the foreground and the background have different 
variances and the same mean. 

For these two cases, the method of Yezzi et al. [15] would require selecting the appropriate statistic (the mean or the variance, respectively) a priori, whereas our method does not.

Now we consider the more challenging image shown in Figure 4(a). The two underlying distributions are illustrated in Figure 4(c) and Figure 4(d). Since the two distributions have the same mean and the same variance, it is hard even for a human observer to separate the foreground from the background. To let readers see the foreground, we mark the actual boundary with a curve in Figure 4(b). For this kind of image, methods based on means and variances, such as that proposed by Yezzi et al. [15], would no longer work.

Figure 5 shows our segmentation results. As shown in Figure 5(a), we used an automatic initialization with multiple seeds. For this kind of image, using a simple initialization as in the examples of Figure 2 and Figure 3 leads to a large number of iterations, and in some cases the curve may get stuck in a local optimum. The advantage of the multiple-seed initialization is that the seeds cover the entire image, so the evolution of the curve takes place globally rather than from a single initial contour. Figure 5(b) and Figure 5(c) show intermediate stages of the evolution, where the seeds in the background region gradually shrink at each iteration whereas those in the foreground region grow. Figure 5(d) gives the final segmentation result.
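A multiple-seed initialization can be represented in a level set implementation as a function whose zero level set is a collection of small circles spread over the image. The sketch below uses a hypothetical helper `multi_seed_level_set` that places seeds on a regular grid and returns a signed distance function $\phi$ that is negative inside the seeds, so the initial label is $L(x) = F$ where $\phi(x) < 0$. The grid layout, `spacing`, `radius`, and the sign convention are assumptions; the paper does not specify how its seeds are placed.

```python
import numpy as np

def multi_seed_level_set(shape, spacing=16, radius=5.5):
    """Level set function for a regular grid of circular seeds (hypothetical helper).

    Returns phi with phi < 0 inside the seeds (the initial region R) and phi > 0 outside.
    Because the seeds do not overlap (spacing > 2 * radius), the pointwise minimum
    of the per-seed signed distances is the signed distance to their union.
    """
    rows, cols = np.indices(shape, dtype=float)
    centres_y = np.arange(spacing / 2.0, shape[0], spacing)
    centres_x = np.arange(spacing / 2.0, shape[1], spacing)
    phi = np.full(shape, np.inf)
    for cy in centres_y:
        for cx in centres_x:
            phi = np.minimum(phi, np.hypot(rows - cy, cols - cx) - radius)
    return phi

phi = multi_seed_level_set((64, 64))
initial_mask = phi < 0  # L(x) = F where phi < 0
print(initial_mask.sum())  # 16 seeds x 97 pixels = 1552
```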

We now report the result for a leopard image, which is similar to the bimodal-versus-unimodal example of Figure 5. Figure 6(d) shows the segmentation result. The final curve captures the main body of the leopard and some parts of its tail and legs. The missing parts of the tail and legs look similar to the background, which makes a perfect segmentation difficult. Paragios et al. [8] performed a similar experiment on a leopard image. Their supervised texture segmentation algorithm requires, as input, an image patch taken from the leopard and an image patch taken from the background. Notably, our method, which is unsupervised, can segment this complex image as accurately as their supervised algorithm.

(a) initial (b) intermediate (c) intermediate (d) final

Fig. 5. Evolution of the curve on a synthetic image: bimodal versus unimodal densities.

(c) intermediate (d) final

Fig. 6. Evolution of the curve on a leopard image.

5. CONCLUSION 

We have developed a new information theoretic image segmentation method based on nonparametric statistics and curve evolution. We have formulated the segmentation problem as one of maximizing the mutual information between the region labels and the pixel intensities, subject to a constraint on curve length. We have derived the curve evolution equations for the optimization problem posed in our framework. Due to the nonparametric nature of our formulation, the proposed technique can automatically handle a variety of segmentation problems in which many currently available curve evolution-based techniques would either fail completely or at least require the a priori extraction of representative statistics for each region. We use fast techniques for the implementation of curve evolution and nonparametric estimation, which keep the computational complexity at a reasonable level. Our preliminary experimental results have shown the strength of the proposed technique in accurately segmenting real and synthetic images.

We have recently extended our method to problems involving more than two regions. Our current work involves the use of spatially dependent probability density functions for accurate texture modeling and segmentation.